Skip to content

feat(core): bump DataFusion to 49.0.1#1293

Merged
douenergy merged 6 commits intoCanner:mainfrom
goldmedal:feat/upgrade-datafusion-49.0.1
Aug 22, 2025
Merged

feat(core): bump DataFusion to 49.0.1#1293
douenergy merged 6 commits intoCanner:mainfrom
goldmedal:feat/upgrade-datafusion-49.0.1

Conversation

@goldmedal
Copy link
Copy Markdown
Contributor

@goldmedal goldmedal commented Aug 21, 2025

Description

Summary by CodeRabbit

  • Bug Fixes
    • Preserve span/metadata during query rewrites; adapt literal/limit and datetime handling for stable parsing and planning.
  • Refactor
    • Simplified SQL translation paths, adjusted SQL formatting (join style, explicit casts, aliasing) and removed several analyzer/optimizer rule steps.
  • Chores
    • Bumped multiple dependencies and fixed repository URL; updated runtime dependency versions.
  • Tests
    • Updated numerous test expectations and snapshots, adjusted function-count expectations, and temporarily disabled a flaky view filter test.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Aug 21, 2025

Walkthrough

Adapts code to upstream DataFusion API/AST changes (Literal/ValueWithSpan/ObjectNamePart/FieldRef/ScalarFunctionArgs), preserves spans/metadata through logical-plan rewrites, adjusts MDL analyzer/optimizer rule pipelines and dialect translation, bumps workspace dependencies, and updates tests and sqllogictest expectations.

Changes

Cohort / File(s) Summary
AST shape & span preservation
wren-core-py/src/context.rs, wren-core/core/src/logical_plan/optimize/simplify_timestamp.rs, wren-core/core/src/logical_plan/utils.rs, wren-core/core/src/logical_plan/analyze/model_anlayze.rs, wren-core/core/src/mdl/dialect/inner_dialect.rs
Updated pattern matches to new AST shapes (e.g., Expr::Literal(_, _), ValueWithSpan); preserve spans/metadata when rewriting Subquery/Alias; adapt join field rename (null_equals_nullnull_equality).
MDL context: imports & rule pipelines
wren-core/core/src/mdl/context.rs
Changed CatalogProvider import path; ViewTable::try_newViewTable::new; removed several analyzer/optimizer rules from pipelines (ExpandWildcardRule, InlineTableScan, UnwrapCastInComparison).
UDF & datatypes API updates
wren-core/core/src/mdl/function.rs
Introduced/used FieldRef; changed scalar UDF API to invoke_with_args(ScalarFunctionArgs) and window UDF field to return FieldRef; adjusted return-field construction and tests (list element renamed item and made nullable).
Dialect refactor & SQL generation simplification
wren-core/core/src/mdl/dialect/wren_dialect.rs, wren-core/core/src/mdl/dialect/utils.rs
Removed custom SQL helpers for arrays/named_structs and now delegate to inner dialect; construct ObjectName using ObjectNamePart::Identifier(Ident); adjusted imports and identifier-quoting logic.
Workspace / manifests
wren-core/Cargo.toml, wren-core-py/Cargo.toml
Fixed malformed repository URL; bumped async-trait, datafusion branch, and tokio versions; bumped tokio in wren-core-py.
SQL generation & tests
wren-core/core/src/mdl/mod.rs, wren-core-py/tests/test_modeling_core.py, ibis-server/tests/conftest.py, ibis-server/tests/routers/v3/connector/local_file/test_functions.py
Updated snapshot/test expectations: explicit RIGHT OUTER JOIN, many CAST(... AS BIGINT) additions, alias/derived-projection changes; adjusted expected function counts (tests updated from 276→283 and 283→290 where applicable); test constants updated.
sqllogictest adjustments
wren-core/sqllogictest/test_files/view.slt
Commented-out a failing TR block, added a TODO about push_down_filter, preserved query/result as comments.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Frontend as SQL Frontend
  participant MDL as MDL Context
  participant Analyzer as Analyzer Pipeline
  participant Optimizer as Optimizer Pipeline
  participant DF as DataFusion AST

  User->>Frontend: submit SQL / request transformation
  Frontend->>MDL: build LogicalPlan (updated AST shapes)
  MDL->>Analyzer: analyze_rule_for_*()
  note right of Analyzer: analyzer pipeline changed (removed rules)
  Analyzer-->>MDL: analyzed LogicalPlan (spans/metadata preserved)
  MDL->>Optimizer: optimize_rule_for_unparsing()
  note right of Optimizer: optimizer pipeline changed (removed rules)
  Optimizer-->>MDL: optimized LogicalPlan
  MDL->>DF: translate to SQL/AST (uses ObjectNamePart, FieldRef, ValueWithSpan)
  DF-->>MDL: SQL/AST nodes
  MDL-->>Frontend: final SQL/plan
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • douenergy
  • wwwy3y3
  • onlyjackfrost

Poem

In burrows of bytes I nibble and hop,
Spans snug in pockets, no data will drop.
Rules trimmed like twigs, functions re-lined,
I hop through ASTs and tidy each bind.
Carrot of tests stashed — a rabbit-sized grind. 🐇✨

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added core dependencies Pull requests that update a dependency file rust Pull requests that update Rust code python Pull requests that update Python code labels Aug 21, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (13)
wren-core/core/src/mdl/dialect/inner_dialect.rs (3)

185-188: Good adaptation to Expr::Literal(_, _) + minor simplification

Matching the two-field Literal for Utf8/LargeUtf8 is correct for DF 49.x. You can simplify the return to avoid wrapping an Ok around a ?.

-            | Expr::Literal(ScalarValue::LargeUtf8(Some(s)), _) => {
-                Ok(self.datetime_field_from_str(s)?)
-            }
+            | Expr::Literal(ScalarValue::LargeUtf8(Some(s)), _) => {
+                self.datetime_field_from_str(s)
+            }

195-231: Consider stricter validation for WEEK(...) parsing

The WEEK(…) parser accepts any trailing characters after the closing ')' and doesn’t validate unmatched parentheses robustly. Not a blocker, but it can reduce surprising inputs.

  • Ensure end == s.len() - 1 after finding ')'.
  • Validate that start < end and there’s no extra content outside WEEK(…).

251-254: Typo in local variable name

uppsercase should be uppercase. Purely cosmetic.

-fn non_uppercase(sql: &str) -> bool {
-    let uppsercase = sql.to_uppercase();
-    uppsercase != sql
-}
+fn non_uppercase(sql: &str) -> bool {
+    let uppercase = sql.to_uppercase();
+    uppercase != sql
+}
wren-core/core/src/logical_plan/utils.rs (2)

50-52: List child field aligns with DF 49; add a short note

Switching to Field::new("item", data_type, true) matches Arrow/DF conventions (child name “item”, elements nullable). Good change. Consider a short doc comment here to avoid future regressions.


169-171: Potentially invalid Arrow time type mapping

Mapping "time" to DataType::Time32(TimeUnit::Nanosecond) is unconventional. In Arrow, nanoseconds typically use Time64(Nanosecond); Time32 is generally for Second/Millisecond. Not related to this PR, but worth correcting to avoid downstream kernel incompatibilities and surprising behavior in parquet/IPC.

- "time" => DataType::Time32(TimeUnit::Nanosecond), // chose the smallest time unit
+ "time" => DataType::Time64(TimeUnit::Nanosecond),

If you prefer 32-bit, use Second or Millisecond.

wren-core/core/src/mdl/dialect/utils.rs (1)

58-63: Correct ObjectName construction under new AST

Wrapping Ident in ObjectNamePart::Identifier is the right adjustment for DF 49.0.1. One minor improvement: if you have spans available upstream, consider threading them instead of Span::empty() to preserve diagnostics.

wren-core/sqllogictest/test_files/view.slt (1)

19-23: Don’t silently comment out failing logic tests—gate or track

Understandable to disable while fixing push_down_filter on views. Please either:

  • Add a tracking issue and reference it here, or
  • Use a harness-supported skip/ignore mechanism (if available) instead of comments, so the test remains visible in reports.

I can help convert this to a gated/ignored test if the framework supports it.

wren-core/core/src/logical_plan/optimize/simplify_timestamp.rs (1)

146-154: Literal arm updated; consider adding tests for both Cast and TryCast

Updating to Expr::Literal(value, _) is correct. Consider adding unit tests that cover:

  • Cast/TryCast of timestamp literals across all four units
  • NamePreserver preserving aliases through simplification

I can scaffold tests under core/src/logical_plan/optimize/tests if helpful.

wren-core-py/src/context.rs (1)

251-252: Consider preserving numeric formatting semantics for newly introduced LIMIT.

You default the is flag to false for new limits. If your AST uses this flag to track formatting (e.g., integer vs decimal/long), consider setting it to a sensible default (likely false is fine) or making it configurable if downstream expects a specific representation.

wren-core/core/src/logical_plan/analyze/model_anlayze.rs (1)

715-726: Preserve Column spans when relation doesn’t belong to MDL.

In the non-MDL path, rewrite_column_qualifier currently rebuilds a Column with Spans::new(), which drops original spans while marking Transformed::no. Capture and pass through the original spans to avoid span loss.

Apply this minimal change to preserve spans without broad refactors:

-            Expr::Column(Column { relation, name, .. }) => {
-                if let Some(relation) = relation {
+            Expr::Column(Column { relation, name, spans, .. }) => {
+                if let Some(relation) = relation {
                     Ok(self.rewrite_column_qualifier(relation, name, alias_model))
                 } else {
@@
-                    Ok(Transformed::yes(ident))
+                    Ok(Transformed::yes(ident))
                 }
             }

and adjust rewrite_column_qualifier’s non-MDL branch to use the provided spans (pass it in or return Transformed::no with the original Column if available):

-        } else {
-            Transformed::no(Expr::Column(Column {
-                relation: Some(relation),
-                name,
-                spans: Spans::new(),
-            }))
-        }
+        } else {
+            // Prefer to return the original Expr::Column with its spans (if available)
+            Transformed::no(Expr::Column(Column {
+                relation: Some(relation),
+                name,
+                spans, // preserve existing spans
+            }))
+        }

If threading spans into rewrite_column_qualifier complicates the signature, alternatively decide the Transformed::no branch directly in map_column_and_rewrite_qualifier and return the original Expr::Column there.

wren-core/core/src/mdl/function.rs (2)

357-366: Window UDF field now returns FieldRef; derive return type from input_fields.

This matches the new API and avoids reliance on input_types. One suggestion: consider nullability propagation (e.g., SameAsInput should often inherit input nullability).

Example tweak:

-        Ok(Field::new(field_args.name(), return_type, false).into())
+        let nullable = match self.return_type {
+            ReturnType::SameAsInput => field_args.input_fields().first().map(|f| f.is_nullable()).unwrap_or(true),
+            _ => false,
+        };
+        Ok(Field::new(field_args.name(), return_type, nullable).into())

140-147: Support LargeList and FixedSizeList for SameAsInputFirstArrayElement.

Currently only List is handled; add LargeList and FixedSizeList to avoid not_impl_err when callers pass those array types.

-            ReturnType::SameAsInputFirstArrayElement => {
+            ReturnType::SameAsInputFirstArrayElement => {
                 if arg_types.is_empty() {
                     return not_impl_err!("No input type");
                 }
-                if let DataType::List(field) = &arg_types[0] {
-                    field.data_type().clone()
-                } else {
-                    return not_impl_err!("Input type is not array");
-                }
+                match &arg_types[0] {
+                    DataType::List(field) |
+                    DataType::LargeList(field) => field.data_type().clone(),
+                    DataType::FixedSizeList(field, _) => field.data_type().clone(),
+                    _ => return not_impl_err!("Input type is not array"),
+                }
             }
wren-core/core/src/mdl/dialect/wren_dialect.rs (1)

42-47: identifier_quote_style compiles Regex on every call; cache it.

Minor perf nit: compile the regex once statically.

For example:

-use regex::Regex;
+use regex::Regex;
+use once_cell::sync::Lazy;
+static IDENT_RE: Lazy<Regex> = Lazy::new(|| Regex::new(r"^[a-zA-Z_][a-zA-Z0-9_]*$").unwrap());
@@
-        let identifier_regex = Regex::new(r"^[a-zA-Z_][a-zA-Z0-9_]*$").unwrap();
-        if ALL_KEYWORDS.contains(&identifier.to_uppercase().as_str())
-            || !identifier_regex.is_match(identifier)
+        if ALL_KEYWORDS.contains(&identifier.to_uppercase().as_str())
+            || !IDENT_RE.is_match(identifier)
             || non_lowercase(identifier)

Ensure once_cell is available in the crate; otherwise use lazy_static.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 388a91a and 8a8389c.

⛔ Files ignored due to path filters (1)
  • wren-core-py/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • wren-core-py/src/context.rs (2 hunks)
  • wren-core/Cargo.toml (1 hunks)
  • wren-core/core/src/logical_plan/analyze/model_anlayze.rs (5 hunks)
  • wren-core/core/src/logical_plan/optimize/simplify_timestamp.rs (2 hunks)
  • wren-core/core/src/logical_plan/utils.rs (2 hunks)
  • wren-core/core/src/mdl/context.rs (2 hunks)
  • wren-core/core/src/mdl/dialect/inner_dialect.rs (1 hunks)
  • wren-core/core/src/mdl/dialect/utils.rs (2 hunks)
  • wren-core/core/src/mdl/dialect/wren_dialect.rs (2 hunks)
  • wren-core/core/src/mdl/function.rs (4 hunks)
  • wren-core/core/src/mdl/mod.rs (21 hunks)
  • wren-core/sqllogictest/test_files/view.slt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
wren-core/core/src/mdl/context.rs (2)
wren-core/core/src/mdl/mod.rs (1)
  • catalog (294-296)
wren-core-base/manifest-macro/src/lib.rs (1)
  • view (347-366)
🪛 Gitleaks (8.27.2)
wren-core/core/src/mdl/mod.rs

2345-2345: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


2351-2351: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: test
  • GitHub Check: ci
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo check
  • GitHub Check: cargo test (macos)
🔇 Additional comments (22)
wren-core/core/src/logical_plan/utils.rs (1)

298-303: All Expr::Literal patterns confirmed updated
A repository-wide scan of every Expr::Literal usage shows all are now using two arguments ((_, _)), and there are no remaining single-argument patterns to update:

  • core/src/logical_plan/utils.rs: Expr::Literal(_, _)
  • core/src/logical_plan/optimize/simplify_timestamp.rs:
    • matches!(sub_expr.as_ref(), Expr::Literal(_, _))
    • Expr::Literal(value, _)
  • core/src/mdl/dialect/inner_dialect.rs: Expr::Literal(ScalarValue::Utf8(Some(s)), _) / Expr::Literal(ScalarValue::LargeUtf8(Some(s)), _)

No further changes are required.

wren-core/core/src/mdl/dialect/utils.rs (1)

21-21: Import of ObjectNamePart is required and correct

The import aligns with the new AST form for ObjectName segments. Looks good.

wren-core/core/src/logical_plan/optimize/simplify_timestamp.rs (1)

135-141: Literal shape check updated correctly

The guard matches!(sub_expr.as_ref(), Expr::Literal(_, _)) is the right fix for DF 49. Behavior unchanged. Nice.

wren-core-py/src/context.rs (1)

33-33: Good: switched to ValueWithSpan to keep literal spans intact.

Using ValueWithSpan in the import set aligns with the PR-wide span/metadata preservation. No concerns.

wren-core/Cargo.toml (2)

11-11: Repository URL fix looks good.


16-16: Bump async-trait to 0.1.89: compatible and non-breaking here.

wren-core/core/src/logical_plan/analyze/model_anlayze.rs (4)

104-108: Robust Subquery matching with .. to accommodate new fields.

Good adaptation for additional Subquery fields (e.g., spans). This prevents brittle matches.


446-447: Updated Join field null_equality: aligned with DataFusion v49.

Field rename addressed correctly; functional behavior preserved.


596-611: Preserve Subquery spans on rewrite.

Carrying spans through the Subquery transformation is consistent with the PR goal of span/metadata preservation.


731-740: Alias now preserves metadata: good.

Retaining metadata in Alias during expression rewrites avoids losing documentation or hints attached upstream.

wren-core/core/src/mdl/function.rs (3)

1-1: Switch to FieldRef: correct for DataFusion v49 Arrow types.

Importing FieldRef and using it where needed aligns with upstream type changes.


228-231: Updated ScalarUDFImpl to invoke_with_args: correct trait adaptation.

Returning an internal_err ensures the bypass UDF is never executed at runtime, which is the intended design.


604-605: Test update for list element Field(name="item", nullable=true) is correct.

Matches DF v49 semantics; good coverage.

wren-core/core/src/mdl/dialect/wren_dialect.rs (1)

21-21: Narrowed imports and reliance on ALL_KEYWORDS/Expr: cleaner surface.

wren-core/core/src/mdl/context.rs (4)

17-17: Import path correctly updated for DataFusion 49.0.1

The CatalogProvider import path change from datafusion::catalog_common to datafusion::catalog aligns with the upstream DataFusion 49.0.1 reorganization.


303-303: ViewTable constructor change handled correctly

The switch from ViewTable::try_new to ViewTable::new is appropriate since the new API no longer returns a Result. This aligns with DataFusion 49.0.1 API changes where view creation is now infallible.


169-187: Simplified analyzer rules for local runtime

The removal of ExpandWildcardRule and InlineTableScan analyzer rules appears intentional based on the PR context. These rules are likely now handled differently in DataFusion 49.0.1 or are no longer necessary for the local runtime mode.


220-260: Ignore incorrect assertion about UnwrapCastInComparison removal

A search of the current codebase shows no references to UnwrapCastInComparison, so it was never part of the unparser pipeline and hasn’t been removed. The suggestion to verify its replacement isn’t applicable here.

Likely an incorrect or invalid review comment.

wren-core/core/src/mdl/mod.rs (4)

695-695: Test assertions updated for new SQL generation behavior

The test expectations now include explicit CAST(column AS BIGINT) for numeric comparisons and RIGHT OUTER JOIN instead of RIGHT JOIN. These changes reflect DataFusion's more explicit type casting in the generated SQL for better cross-database compatibility.

Also applies to: 707-707, 1173-1173


1739-1739: Numeric predicates now include explicit type casting

The generated SQL now includes CAST(column AS BIGINT) for integer comparisons (e.g., CAST(customer.c_custkey AS BIGINT) = 1). This ensures type safety and consistency across different database backends.

Also applies to: 1774-1774, 1823-1826, 1874-1877


2345-2345: False positive: Test data strings are not API keys

The static analysis tool incorrectly flagged the test strings 'Gura' and session_user as potential API keys. These are clearly test data values used in row-level access control testing scenarios.

Also applies to: 2351-2351


1173-1173: UNNEST test output includes new aliasing structure

The UNNEST query now generates a derived_projection alias with explicit column naming. This change improves SQL clarity and ensures proper column referencing in complex queries.

@github-actions github-actions bot added the ibis label Aug 22, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
ibis-server/tests/conftest.py (1)

14-14: Hard-coded function count updated to 283 — consider deriving dynamically to reduce brittleness

This constant tends to change with DataFusion bumps. If feasible, derive the baseline count at runtime (e.g., via an API call or a helper in tests) or centralize it in a single test utility to avoid manual updates in multiple places.

If you keep the constant, ensure all tests referencing DataFusion function counts have been updated consistently (search for “len(functions) ==” and “DATAFUSION_FUNCTION_COUNT” across the repo).

wren-core-py/tests/test_modeling_core.py (1)

139-139: Replace brittle exact counts with a dynamic calculation based on union of built-ins and CSV-defined functions

Exact numbers will drift with DataFusion releases. Compute the expected length as the union of:

  • base registry functions (no CSV)
  • names in tests/functions.csv

This keeps the assertion stable while still verifying the merge logic.

Apply this diff within the current test to remove hard-coded counts:

@@
-    functions = session_context.get_available_functions()
-    assert len(functions) == 290
+    functions = session_context.get_available_functions()
+    # Dynamically derive expected count: DataFusion registry ∪ CSV custom functions
+    base_functions = SessionContext(manifest_str, None).get_available_functions()
+    base_names = {f.name for f in base_functions}
+    import csv  # local import to keep test module scope clean
+    with open(path, newline="") as f:
+        csv_names = {row["name"] for row in csv.DictReader(f) if row.get("name")}
+    assert len(functions) == len(base_names | csv_names)
@@
-    functions = session_context.get_available_functions()
-    assert len(functions) == 283
+    functions = session_context.get_available_functions()
+    assert len(functions) == len(base_names)

If you prefer a dedicated helper, I can factor this into a small utility used by both this test and ibis-server so future bumps don’t require manual edits. Do you want me to open a follow-up PR for that?

Also applies to: 151-151

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8a8389c and 32c0bdf.

⛔ Files ignored due to path filters (1)
  • wren-core-py/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • ibis-server/tests/conftest.py (1 hunks)
  • wren-core-py/Cargo.toml (1 hunks)
  • wren-core-py/src/context.rs (2 hunks)
  • wren-core-py/tests/test_modeling_core.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • wren-core-py/src/context.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cargo test (amd64)
  • GitHub Check: ci
  • GitHub Check: cargo test (win64)
🔇 Additional comments (1)
wren-core-py/Cargo.toml (1)

22-22: Verify Tokio version & feature consistency in wren-core workspace

Please confirm the following based on the audit of all tokio declarations:

  • In wren-core/Cargo.toml (the workspace root):
    • tokio = { version = "1.46", features = ["rt", "rt-multi-thread", "macros"] }
  • In wren-core/core/Cargo.toml:
    • tokio = { workspace = true, features = ["rt", "rt-multi-thread", "macros"] }
  • In wren-core/benchmarks/Cargo.toml and wren-core/wren-example/Cargo.toml:
    • tokio = { workspace = true } (inherits features from the workspace root)
  • In wren-core/sqllogictest/Cargo.toml:
    • Under regular [dependencies]: tokio = { workspace = true } (inherits all features)
    • Under [[test]] deps: tokio = { workspace = true, features = ["rt-multi-thread"] }
      Note: this overrides the workspace’s full feature set and omits "macros". If your tests use #[tokio::test] or any Tokio macros, you’ll need to add "macros" here.

Separately, wren-core-py/Cargo.toml is not part of that workspace—it pins tokio = "1.46.0" with default features. If you intend for the Python-binding crate to share exactly the same version and feature flags, you can either add it to the workspace and switch to workspace = true or continue managing its dependency explicitly.

Optional refactoring recommendation:

  • Centralize the Tokio dependency under a [workspace.dependencies] section in the workspace-root wren-core/Cargo.toml (e.g. tokio = { version = "1.46.0", features = ["rt", "rt-multi-thread", "macros"] }). This will ensure all member crates (including tests) stay in sync without per-crate repetition.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
ibis-server/tests/routers/v3/connector/local_file/test_functions.py (2)

56-58: Fix stale inline comment to match the new expectation (429 → 427).

The inline comment still says 429 while the assertion expects 427. This will confuse future readers.

Apply this diff:

-    # 429 is the number of functions in `resources/function_list/duckdb.csv` file excluded the default functions in DataFusion.
-    assert len(result) == DATAFUSION_FUNCTION_COUNT + 427
+    # 427 is the number of functions in resources/function_list/duckdb.csv, excluding the default DataFusion functions.
+    assert len(result) == DATAFUSION_FUNCTION_COUNT + 427

52-58: Make the size assertion data-driven to avoid churn on every DataFusion bump.

Right now the test hard-codes “427”. You can compute the delta by comparing the CSV function names against the base function names returned by the API (with remote list unset). This keeps the test stable when DataFusion adds/removes built-ins or when the CSV changes.

Proposed refactor within this block:

-    config.set_remote_function_list_path(function_list_path)
-    response = await client.get(url=f"{base_url}/functions")
-    assert response.status_code == 200
-    result = response.json()
-    # 429 is the number of functions in `resources/function_list/duckdb.csv` file excluded the default functions in DataFusion.
-    assert len(result) == DATAFUSION_FUNCTION_COUNT + 427
+    # Capture the base set of function names from DataFusion alone
+    base_names = {f["name"] for f in result}
+    base_len = len(base_names)
+
+    # Enable remote function list and fetch again
+    config.set_remote_function_list_path(function_list_path)
+    response = await client.get(url=f"{base_url}/functions")
+    assert response.status_code == 200
+    result = response.json()
+
+    # Compute expected delta from the CSV by excluding names already present in DataFusion
+    csv_names = _read_duckdb_function_names(function_list_path)
+    expected_delta = len(csv_names - base_names)
+    assert len(result) == base_len + expected_delta

Add this small helper (outside this function, anywhere in the module), and update imports accordingly:

import csv
import os

def _read_duckdb_function_names(function_list_dir: str) -> set[str]:
    csv_path = os.path.join(function_list_dir, "duckdb.csv")
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["name"] for row in csv.DictReader(f)}

This mirrors the server’s de-duplication by name and removes the magic number.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 32c0bdf and 2babd27.

📒 Files selected for processing (1)
  • ibis-server/tests/routers/v3/connector/local_file/test_functions.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: cargo test (macos)
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo check
  • GitHub Check: test
  • GitHub Check: ci

@goldmedal goldmedal requested a review from douenergy August 22, 2025 03:39
@douenergy
Copy link
Copy Markdown
Contributor

Thanks @goldmedal

@douenergy douenergy merged commit 92a1c7d into Canner:main Aug 22, 2025
15 checks passed
nhaluc1005 pushed a commit to nhaluc1005/text2sql-practice that referenced this pull request Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core dependencies Pull requests that update a dependency file ibis python Pull requests that update Python code rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants